Towards Efficient Indexing of Arbitrary Similarity
Abstract
The popularity of similarity search has grown with the increased interest in multimedia databases, bioinformatics, and social networks, and with the growing number of users trying to find information in huge collections of unstructured data. During exploration, users handle database objects in different ways depending on the similarity model used, which may range from simple to complex. Efficient indexing techniques for similarity search are required, especially for growing databases. In this paper, we study implementation possibilities of the recently announced theoretical framework SIMDEX, whose task is to algorithmically explore a given similarity space and find possibilities for efficient indexing. Instead of relying on a fixed set of indexing properties, such as the metric space axioms, SIMDEX seeks alternative properties that are valid in a particular similarity model (database) and, at the same time, provide efficient indexing. In particular, we propose to implement the fundamental parts of SIMDEX by means of genetic programming (GP), which we expect to yield a high-quality set of expressions (axioms) useful for indexing.
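To make the idea of "properties valid in a particular similarity model" concrete, here is a minimal sketch under assumptions that are not taken from the paper. It supposes that a candidate axiom has the form of a lower bound LB(d(q,p), d(p,o)) <= d(q,o) and scores it on sampled triplets by validity (how often the bound holds) and tightness (how close it gets to the true distance). The names `squared_l2`, `CANDIDATES`, and `score` are hypothetical, and the two candidate expressions are hand-picked for illustration, whereas a GP-based approach would generate and evolve such expressions automatically.

```python
import itertools
import math
import random


def squared_l2(x, y):
    """Squared Euclidean distance: a semimetric that violates the triangle inequality."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical candidate lower bounds on d(q, o), given a = d(q, p) and b = d(p, o).
CANDIDATES = {
    "triangle": lambda a, b: abs(a - b),
    "sqrt_triangle": lambda a, b: (math.sqrt(a) - math.sqrt(b)) ** 2,
}


def score(lower_bound, dist, sample, eps=1e-9):
    """Return (validity, tightness) over all ordered triplets of the sample.

    validity  = fraction of triplets where the bound does not exceed d(q, o)
    tightness = mean of bound / d(q, o) over valid triplets with d(q, o) > 0
    """
    total = valid = counted = 0
    ratio_sum = 0.0
    for q, p, o in itertools.permutations(sample, 3):
        true_d = dist(q, o)
        lb = lower_bound(dist(q, p), dist(p, o))
        total += 1
        if lb <= true_d + eps:
            valid += 1
            if true_d > 0:
                ratio_sum += lb / true_d
                counted += 1
    validity = valid / total if total else 0.0
    tightness = ratio_sum / counted if counted else 0.0
    return validity, tightness


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [(rng.random(), rng.random()) for _ in range(30)]
    for name, lb in CANDIDATES.items():
        validity, tightness = score(lb, squared_l2, sample)
        print(f"{name}: validity={validity:.3f} tightness={tightness:.3f}")
```

In this sketch the ordinary triangle-based bound can fail for the squared Euclidean distance, while the square-root variant always holds because the square root of that distance is a metric. A search procedure would prefer expressions that are both always valid and as tight as possible, since tighter bounds allow more objects to be filtered out without computing their distances.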
Similar resources
Universal Indexing of Arbitrary Similarity Models
The increasing amount of available unstructured content together with the growing number of large non-relational databases put more emphasis on the content-based retrieval and precisely on the area of similarity searching. Although there exist several indexing methods for efficient querying, not all of them are best-suited for arbitrary similarity models. Having a metric space, we can easily ap...
Optimizing Hashing Functions for Similarity Indexing in Arbitrary Metric and Nonmetric Spaces
A large number of methods have been proposed for similarity indexing in Euclidean spaces, and several such methods can also be used in arbitrary metric spaces. Such methods exploit specific properties of Euclidean spaces or general metric spaces. Designing general-purpose similarity indexing methods for arbitrary metric and non-metric distance measures is a more difficult problem, due to the vas...
Efficient Similarity Search for Time Series Data Based on the Minimum Distance
We address the problem of efficient similarity search based on the minimum distance in large time series databases. Most previous work has focused on similarity matching and retrieval of time series based on the Euclidean distance. However, as we demonstrate in this paper, the Euclidean distance has limitations as a similarity measure. It is sensitive to the absolute offsets of time seque...
Shock-Based Indexing into Large Shape Databases
This paper examines issues arising in applying a previously developed edit-distance shock graph matching technique to indexing into large shape databases. This approach compares the shock graph topology and attributes to produce a similarity metric, and results in a 100% recognition rate in querying a database of approximately 200 shapes. However, indexing into a significantly larger database is ...
Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes
Set-valued attributes are convenient to model complex objects occurring in the real world. Currently available database systems support the storage of set-valued attributes in relational tables but contain no primitives to query them efficiently. Queries involving set-valued attributes either perform full scans of the source data or make multiple passes over single-value indexes to reduce the n...
Publication date: 2013